Finding paths in sparse random graphs requires many queries

Asaf Ferber, Michael Krivelevich, Benny Sudakov, Pedro Vieira


Abstract 


We discuss a new algorithmic type of problem in random graphs: studying the minimum number of queries one has to ask about adjacency between pairs of vertices of a random graph $G \sim \mathcal{G}(n,p)$ in order to find a subgraph which possesses some target property with high probability. In this paper we focus on finding long paths in $G \sim \mathcal{G}(n,p)$ when $p = \frac{1+\varepsilon}{n}$ for some fixed constant $\varepsilon > 0$. This random graph is known to typically contain paths of linear length.

To have $\ell$ edges with high probability in $G \sim \mathcal{G}(n,p)$ one clearly needs to query at least $\Omega\left(\frac{\ell}{p}\right)$ pairs of vertices. Can we find a path of length $\ell$ economically, i.e., by querying roughly that many pairs? We argue that this is not possible and one needs to query significantly more pairs. We prove that any randomised algorithm which finds a path of length $\ell = \Omega\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$ with at least constant probability in $G \sim \mathcal{G}(n,p)$ with $p = \frac{1+\varepsilon}{n}$ must query at least $\Omega\left(\frac{\ell}{p\varepsilon\log(1/\varepsilon)}\right)$ pairs of vertices. This is tight up to the $\log(1/\varepsilon)$ factor.


1 Introduction 

Let $\mathcal{P}$ be a monotone increasing graph property (that is, a property of graphs that cannot be violated by adding edges). Suppose that the edge probability $p = p(n)$ is chosen so that a random graph $G$ drawn from the probability space $\mathcal{G}(n,p)$ has $\mathcal{P}$ with high probability (whp). How many queries of the type "is $(i,j) \in E(G)$?" are needed for an adaptive algorithm interacting with the probability space $\mathcal{G}(n,p)$ in order to reveal whp a subgraph $G' \subseteq G$ possessing $\mathcal{P}$?

This fairly natural algorithmic setting (see the excellent survey of Frieze and McDiarmid [10] for
an extensive coverage of a variety of problems and results in Algorithmic Theory of Random Graphs) 
has been considered implicitly in several papers on random graphs (e.g. HD, 0), but apparently has 
been stated explicitly only in the companion paper [9] of the authors. Notice that in this framework
the issue of concern is not the amount of computation required for the algorithm to find a target 
structure, but rather the amount of its interaction with the underlying probability space. 

In the discussion below, we assume some basic familiarity with results about the probability space 
G{n,p); the reader is advised to consult monographs [H] and [6] for background on the subject. 

In general, given a monotone property $\mathcal{P}$, what can we expect? If all $n$-vertex graphs belonging to $\mathcal{P}$ have at least $m$ edges, then the algorithm should get at least $m$ positive answers to hit the target property with the required absolute certainty. This means that the obvious lower bound in this case is at least $(1+o(1))m/p$ queries. Perhaps one of the simplest graph properties to consider in this respect is connectedness: for any connected graph $G$ on $n$ vertices a spanning tree can be found after $n-1$ queries with positive answers. The algorithm starts with an arbitrary vertex $v \in V(G)$, and each time queries the pairs leaving the current tree until the first edge is found; the tree is then updated by appending this edge. Thus for the regime where $\mathcal{G}(n,p)$ is whp connected (which is when $p \ge \frac{\ln n + \omega(n)}{n}$ with $\lim_{n\to\infty}\omega(n) = \infty$), we get an algorithm whp discovering a spanning tree after querying $(1+o(1))n/p$ pairs of vertices.
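
The query model and the tree-growing procedure just described can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the class `QueryOracle`, the function `grow_spanning_tree` and the order in which pairs are queried are hypothetical choices made here, not taken from the paper. The oracle decides each pair the first time it is asked, which mimics interacting with the probability space $\mathcal{G}(n,p)$.

```python
import random


class QueryOracle:
    """Lazily sampled G(n, p): a pair is decided on its first query and cached."""

    def __init__(self, n, p, seed=None):
        self.n, self.p = n, p
        self.rng = random.Random(seed)
        self.answers = {}  # (u, v) with u < v  ->  True/False

    def query(self, u, v):
        key = (min(u, v), max(u, v))
        if key not in self.answers:
            self.answers[key] = self.rng.random() < self.p
        return self.answers[key]

    @property
    def num_queries(self):
        # number of distinct pairs that have been queried so far
        return len(self.answers)


def grow_spanning_tree(oracle):
    """Grow a tree from vertex 0, only querying pairs that leave the current tree.

    Returns the list of tree edges; it has n - 1 edges exactly when the
    component of vertex 0 is the whole graph.
    """
    outside = set(range(1, oracle.n))  # vertices not yet in the tree
    frontier = [0]                     # tree vertices whose pairs are not yet exhausted
    edges = []
    while frontier and outside:
        u = frontier.pop()
        for v in sorted(outside):      # snapshot, so removing from `outside` is safe
            if oracle.query(u, v):
                outside.remove(v)
                frontier.append(v)
                edges.append((u, v))
    return edges
```

Every positive answer adds a new vertex to the tree, so a spanning tree costs exactly $n-1$ positive answers; the total number of queries is then governed by how many negative answers are collected along the way.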

A much more challenging problem is that of Hamiltonicity, i.e., of finding a Hamilton cycle. In this case the trivial lower bound translates to $n$ positive answers. In [9] we show that this lower bound is tight by providing an adaptive algorithm interacting with the probability space $\mathcal{G}(n,p)$, which whp finds a Hamilton cycle in $G \sim \mathcal{G}(n,p)$ after obtaining only $(1+o(1))n$ positive answers (provided $p$ is above the sharp threshold for Hamiltonicity in $\mathcal{G}(n,p)$).

Yet another positive example is that of uncovering a giant component in the supercritical regime $p = \frac{1+\varepsilon}{n}$. Though this was not the main concern in [U], the second and the third author presented there a very natural adaptive algorithm (essentially performing the Depth First Search (DFS) on a random input $G \sim \mathcal{G}(n,p)$), typically discovering a connected component of size at least $\varepsilon n/2$ after querying $\varepsilon n^2/2$ vertex pairs.

Upon reviewing these results, the reader may arrive at a conclusion that the above stated trivial lower bound for this type of problems is nearly tight for almost every natural graph property. However, this happens not to be the case, and the main qualitative goal of the present paper is to provide such a negative example, including its analysis. Here we focus on the property of containing a path of length $\ell$ in the supercritical regime in $G \sim \mathcal{G}(n,p)$, that is, when $p = \frac{1+\varepsilon}{n}$ for some fixed constant $\varepsilon > 0$. For this regime, $G \sim \mathcal{G}(n,p)$ is known to contain whp a path of length linear in $n$, due to the classical
result of Ajtai, Komlos and Szemeredi [3] (see m for a recent simple proof of this fact.) Note that 
in order to have $\ell$ edges with high probability in $G \sim \mathcal{G}(n,p)$ one needs to query at least $\frac{\ell}{p}$ pairs of vertices. Can we find a path of length $\ell$ by asking roughly that many queries, as in the case of Hamiltonicity mentioned above? We show that in this case one actually needs to query significantly more pairs of vertices:

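Theorem 1. There exists an absolute constant $C > 0$ such that the following holds. For every constant $q \in (0,1)$ there exist $n_0, \varepsilon_0 > 0$ such that for every fixed $\varepsilon \in (0,\varepsilon_0)$ and any $n \ge n_0$ there is no adaptive algorithm which reveals a path of length $\ell \ge \frac{3C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ with probability at least $q$ in $G \sim \mathcal{G}(n,p)$, where $p = \frac{1+\varepsilon}{n}$, by querying at most $\frac{q\ell}{8640Cp\varepsilon\ln(1/\varepsilon)}$ pairs of vertices.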

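Notice that [H] presents a simple adaptive DFS algorithm, finding a path of length $\frac{1}{5}\varepsilon^2 n$ with probability at least $1 - \exp(-\Omega(\varepsilon n))$ in $G \sim \mathcal{G}(n,p)$ after querying only $O(\varepsilon n^2)$ pairs of vertices. In fact, if one uses the same algorithm to find a path of length $\ell \le \frac{1}{5}\varepsilon^2 n$ in $G \sim \mathcal{G}(n,p)$ then the same argument shows that such a path is found with probability at least $1 - \exp\left(-\Omega\left(\frac{\ell}{\varepsilon}\right)\right)$ after querying at most $O\left(\frac{\ell}{p\varepsilon}\right)$ pairs of vertices. This shows that, up to the $O\left(\log\left(\frac{1}{\varepsilon}\right)\right)$ factor, Theorem 1 is tight.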
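
A DFS-style search for a path in the query model can be sketched as follows. This is a minimal illustration and not the algorithm of the cited paper verbatim: the function name `dfs_find_path`, the `query` callback and the rule of always trying the smallest unvisited vertex first are assumptions made here. The key invariant is that the stack always spans a path of already revealed edges.

```python
def dfs_find_path(n, query, length):
    """Search for a path with `length` edges (length >= 1) by DFS in the query model.

    `query(u, v)` returns True iff the pair {u, v} is an edge.
    Returns (path, num_queries); path is a list of length + 1 vertices, or None.
    """
    unvisited = set(range(n))
    asked = set()   # pairs already queried, so that no pair is paid for twice
    stack = []      # the vertices on the stack always form a path
    while unvisited or stack:
        if not stack:
            start = min(unvisited)
            unvisited.remove(start)
            stack.append(start)
        u = stack[-1]
        for v in sorted(unvisited):
            pair = (min(u, v), max(u, v))
            if pair in asked:
                continue
            asked.add(pair)
            if query(u, v):
                unvisited.remove(v)
                stack.append(v)   # extend the current path by the edge uv
                break
        else:
            stack.pop()           # u has no unexplored neighbour: backtrack
        if len(stack) == length + 1:
            return list(stack), len(asked)
    return None, len(asked)
```

For example, on the deterministic path graph with edges $01, 12, 23$ the search returns the path `[0, 1, 2, 3]` after three queries, while on an empty graph on three vertices it queries all three pairs and reports failure.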

The key ingredient of the proof of Theorem 1 is the following result of independent interest.

Theorem 2. There exist constants $C, \varepsilon_0 > 0$ such that for every fixed $\varepsilon \in (0,\varepsilon_0)$ and $p = \frac{1+\varepsilon}{n}$ we have whp that a graph $G \sim \mathcal{G}(n,p)$ does not contain a set of vertex disjoint paths of lengths at least $\frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ whose union covers at least $13\varepsilon^2 n$ vertices.

The rest of this paper is organised as follows. In Section 2 we provide auxiliary lemmas needed for the proofs of Theorems 1 and 2. In Section 3 we prove Theorem 1 assuming Theorem 2. In Section 4 we prove Theorem 2. Finally, in Section 5 we discuss some concluding remarks.

Notation. Our notation is fairly standard. Given a natural number $n$ we use $[n]$ to denote the set $\{1, 2, \dots, n\}$. Moreover, given a set $V$ we use $S_V$ to denote the permutation group of $V$ and $\binom{V}{2}$ to denote the set of all (unordered) pairs of elements in $V$.

Given a subset $S$ of the vertex set of a graph $G$, $G[S]$ denotes the subgraph of $G$ induced by the vertices in $S$, i.e. the graph with vertex set $S$ whose edges are the ones of $G$ between vertices in $S$.

A subgraph $P$ of the graph $G$ is called a path if $V(P) = \{v_1, \dots, v_\ell\}$ and the edges of $P$ are $v_1v_2, v_2v_3, \dots, v_{\ell-1}v_\ell$. We shall oftentimes refer to $P$ simply by $v_1v_2\dots v_\ell$. We say that such a path $P$ has length $\ell - 1$ (number of edges) and size $\ell$ (number of vertices).

If G is a graph then the 2-core of G is the maximal induced subgraph of G of minimum degree at 
least 2. If no such subgraph exists then the 2-core of G is the empty graph. 
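
The 2-core can be computed by repeatedly deleting vertices of degree less than 2. The following Python sketch illustrates this standard peeling procedure; the function name `two_core` and the adjacency-dictionary input format are choices made here for illustration only.

```python
def two_core(adj):
    """Return the vertex set of the 2-core of an undirected graph.

    `adj` maps each vertex to the set of its neighbours.
    The result is the empty set when no subgraph of minimum degree 2 exists.
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    alive = set(adj)
    to_remove = [v for v in adj if degree[v] < 2]
    while to_remove:
        v = to_remove.pop()
        if v not in alive:
            continue              # v was already peeled off earlier
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1    # w loses the neighbour v
                if degree[w] < 2:
                    to_remove.append(w)
    return alive
```

For instance, for a triangle with a pendant path attached, only the three triangle vertices survive, and for any tree the result is empty.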

Given an ordered set $V$ and a real number $p \in [0,1]$, the binomial random graph model $\mathcal{G}(V,p)$ is a probability space whose ground set consists of all labeled graphs on the vertex set $V$. We can describe the probability distribution of $G \sim \mathcal{G}(V,p)$ by saying that each pair of elements of $V$ forms an edge in $G$ independently with probability $p$. If $V = [n]$ then we will abuse notation slightly and use $\mathcal{G}(n,p)$ to refer to $\mathcal{G}([n],p)$. Given a property $\mathcal{P}$ (that is, a collection of graphs) and a function $p = p(n) \in [0,1]$, we say that $G \sim \mathcal{G}(n,p)$ has $\mathcal{P}$ with high probability (or whp for brevity) if the probability that $G \in \mathcal{P}$ tends to 1 as $n$ tends to infinity.


2 Auxiliary Lemmas 


2.1 Concentration inequalities


We need to employ standard bounds on large deviations of random variables. The following well-known lemma due to Chernoff (commonly known as the "Chernoff bound") provides a bound on the
upper tail of the Binomial distribution (see e.g. a, m)- 

Lemma 1. Let $X \sim \mathrm{Bin}(n,p)$ and let $\mu = \mathbb{E}[X]$. Then $\Pr[X \ge (1+a)\mu] \le e^{-a^2\mu/3}$ for any $0 < a < \frac{3}{2}$.

The next lemma is a concentration inequality for the edge exposure martingale in G{n,p) which 
follows easily from Theorem 7.4.3 of [3]. 


Lemma 2. Suppose $X$ is a random variable in the probability space $\mathcal{G}(n,p)$ such that $|X(G) - X(H)| \le C$ if $G$ and $H$ differ in one edge. Then

$$\Pr\left[\,|X - \mathbb{E}[X]| > C a \sqrt{n^2 p}\,\right] \le 2e^{-a^2/4}$$

for any positive $a < 2\sqrt{n^2 p}$.



2.2 Galton-Watson trees and paths 

A Galton-Watson tree is a random rooted tree, constructed recursively from the root where each node has a random number of children and these random numbers are independent copies of some random variable $\xi$ taking values in $\{0, 1, 2, \dots\}$. We let $\mathcal{T}$ denote a (random) Galton-Watson tree. We view the children of each node as arriving in some random order, so that $\mathcal{T}$ is an ordered, or plane, tree. We consider the conditioned Galton-Watson tree $\mathcal{T}_t$, which is the random tree $\mathcal{T}$ conditioned on having exactly $t$ vertices. In symbols, $\mathcal{T}_t := (\mathcal{T} \mid |\mathcal{T}| = t)$, where, for any tree $T$, $|T|$ denotes its number of vertices.

For a rooted tree $T$, the depth $h(v)$ of a vertex $v$ is its distance to the root (in particular the root has depth 0). We define as usual the height of the rooted tree $T$ by $H(T) := \max\{h(v) : v \in T\}$. The following lemma, which appears in [1], provides essentially optimal uniform sub-Gaussian upper tail bounds on $H(\mathcal{T}_t)$ for every offspring distribution $\xi$ with finite variance.

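Lemma 3. Suppose that $\mathbb{E}[\xi] = 1$ and $0 < \mathrm{Var}[\xi] < \infty$. Then there exist constants $C, c > 0$ (which may depend on $\xi$) such that

$$\Pr[H(\mathcal{T}_t) \ge h] \le C\exp\left(-\frac{ch^2}{t}\right)$$

for all $h \ge 0$ and $t \ge 1$.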


As is well known, the distribution of the tree $\mathcal{T}_t$ is not changed if $\xi$ is replaced by another random variable $\xi'$ whose distribution is created from that of $\xi$ by tilting or conjugation (see e.g. [13]), that is, if for every $k \ge 0$ we have $\Pr[\xi' = k] = c'\mu^k\Pr[\xi = k]$ for some $\mu > 0$ and normalizing constant $c'$. Thus, we see that Lemma 3 remains true for $\xi \sim \mathrm{Poisson}(\mu)$, with $\mu > 0$, in which case the parameters $C, c > 0$ are universal constants which do not depend on the parameter $\mu$. It is also well known (see e.g. Section 6.4 of [7]) that if $\xi \sim \mathrm{Poisson}(\mu)$ then $\mathcal{T}_t$ is distributed as a random rooted labelled tree, that is, a tree picked uniformly from the trees on vertices $\{1, 2, \dots, t\}$ in which one vertex is declared to be the root. From this we obtain an estimate to be used by us later.


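Lemma 4. Given $0 < \ell < t$ let $p_{t,\ell}$ denote the proportion of (rooted) labeled trees on $t$ vertices which contain a path of length at least $\ell$. There exist constants $C, \varepsilon_0 > 0$ such that for any $\varepsilon \in (0,\varepsilon_0)$ if $\ell = \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ and $t_0 = \frac{15}{\varepsilon^2}\ln\left(\frac{1}{\varepsilon}\right)$ then

$$\sum_{\ell \le t \le t_0} p_{t,\ell} \le \varepsilon^3.$$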


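Proof of Lemma 4. It follows from Lemma 3 and the considerations above that there exist constants $C', c' > 0$ such that for every $t \le t_0$:

$$p_{t,\ell} \le C'\exp\left(-\frac{c'\ell^2}{t}\right) \le C'\exp\left(-\frac{c'\ell^2}{t_0}\right) = C'\varepsilon^{c'C^2/15}.$$

Thus, if $C > \sqrt{90/c'}$ and if $\varepsilon_0$ is sufficiently small then we see that for any $\varepsilon \in (0,\varepsilon_0)$ and for $t \le t_0$ we have $p_{t,\ell} \le \varepsilon^6$. Using this we conclude that

$$\sum_{\ell \le t \le t_0} p_{t,\ell} \le \varepsilon^6 \cdot t_0 = 15\varepsilon^4\ln\left(\frac{1}{\varepsilon}\right) \le \varepsilon^3,$$

provided $\varepsilon_0$ is sufficiently small, as claimed. □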


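The next lemma concerns the sizes of Poisson Galton-Watson trees which contain long paths.

Lemma 5. For $\varepsilon > 0$ let $0 < \mu < 1$ be such that $\mu e^{-\mu} = (1+\varepsilon)e^{-(1+\varepsilon)}$. Given $\ell \ge 1$ consider a Poisson($\mu$)-Galton-Watson tree $\mathcal{T}$ and the random variable

$$T_\ell = \begin{cases} |\mathcal{T}| & \text{if } \mathcal{T} \text{ contains a path of length at least } \frac{\ell}{3}, \\ 0 & \text{otherwise,} \end{cases}$$

where $|\mathcal{T}|$ denotes the number of vertices of $\mathcal{T}$. Then there exist constants $C, \varepsilon_0 > 0$ such that for every $\varepsilon \in (0,\varepsilon_0)$ and for $\ell = \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ we have $\mathbb{E}[T_\ell] \le 14\varepsilon^3$ and $\mathrm{Var}[T_\ell] \le \frac{8}{\varepsilon^3}$.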

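Proof. We have

$$\mathbb{E}[T_\ell] = \mathbb{E}\big[\mathbb{E}[T_\ell \mid |\mathcal{T}|]\big] = \sum_{t \ge \ell/3} \Pr[|\mathcal{T}| = t] \cdot \mathbb{E}[T_\ell \mid |\mathcal{T}| = t]. \qquad (1)$$

It is well-known (see, e.g., Section 6.6 of [7]) that the size of the Poisson($\mu$)-Galton-Watson tree $\mathcal{T}$ follows a Borel($\mu$) distribution, namely,

$$\Pr[|\mathcal{T}| = t] = \frac{t^{t-1}(\mu e^{-\mu})^t}{\mu \cdot t!}.$$

Moreover, as discussed in the remarks that follow Lemma 3, if we condition a Poisson($\mu$)-Galton-Watson tree on it having exactly $t$ vertices then it is identically distributed to a random rooted labelled tree on $t$ vertices. Thus, it follows that $\mathbb{E}[T_\ell \mid |\mathcal{T}| = t]$ is equal to $t \cdot p_{t,\ell/3}$, where $p_{t,\ell/3}$ denotes the proportion of rooted labeled trees on $t$ vertices which contain a path of length at least $\frac{\ell}{3}$. Hence, setting $t_0 := \frac{15}{\varepsilon^2}\ln\left(\frac{1}{\varepsilon}\right)$ with foresight, it follows from (1) that

$$\mathbb{E}[T_\ell] = \frac{1}{\mu}\sum_{t \ge \ell/3} \frac{t^t}{t!}\big((1+\varepsilon)e^{-(1+\varepsilon)}\big)^t p_{t,\ell/3} \le \frac{1}{\mu}\sum_{t \ge \ell/3} (1+\varepsilon)^t e^{-\varepsilon t} p_{t,\ell/3} \le 2\left(\sum_{\ell/3 \le t \le t_0} p_{t,\ell/3} + \sum_{t > t_0} e^{-\varepsilon^2 t/3}\right), \qquad (2)$$

where we used the facts that $\frac{t^t}{t!} \le e^t$, that $(1+\varepsilon)^t \le e^{(\varepsilon - \varepsilon^2/3)t}$ (which holds since the first terms of the Taylor series expansion of $\ln(1+\varepsilon)$ are $\varepsilon - \frac{\varepsilon^2}{2}$) and that $\frac{1}{\mu} \le 2$ provided $\varepsilon_0$ is chosen sufficiently small. By Lemma 4 there exist constants $C, \varepsilon_0 > 0$ such that the first sum in (2) is at most $\varepsilon^3$. Moreover, the second sum in (2) is

$$\sum_{t > t_0} e^{-\varepsilon^2 t/3} \le \frac{e^{-\varepsilon^2 t_0/3}}{1 - e^{-\varepsilon^2/3}} \le \frac{6}{\varepsilon^2}\cdot\varepsilon^5 = 6\varepsilon^3, \qquad (3)$$

where we used the fact that $\frac{1}{1 - e^{-x}} \le \frac{2}{x}$ for $x > 0$ sufficiently small (which holds since the first terms of the Taylor series expansion of $e^{-x}$ are $1 - x + \frac{x^2}{2}$). Thus, all in all, we conclude that there exist constants $C, \varepsilon_0 > 0$ such that

$$\mathbb{E}[T_\ell] \le 2\cdot(\varepsilon^3 + 6\varepsilon^3) = 14\varepsilon^3$$

as claimed. Since $|\mathcal{T}| \sim$ Borel($\mu$) it follows that

$$\mathrm{Var}[T_\ell] \le \mathbb{E}[T_\ell^2] \le \mathbb{E}\big[|\mathcal{T}|^2\big] = \frac{1}{(1-\mu)^3}.$$

Moreover, it is straightforward to check that if $\varepsilon_0$ is chosen sufficiently small then $\mu \le 1 - \frac{\varepsilon}{2}$. Thus, we conclude that

$$\mathrm{Var}[T_\ell] \le \frac{8}{\varepsilon^3}$$

as claimed. □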
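
The Borel($\mu$) facts used in the proof of Lemma 5 can be sanity-checked numerically. The Python sketch below is only an illustration: the function names `borel_pmf` and `borel_moments` and the truncation point `t_max` are assumptions made here. It evaluates the probability mass function $\Pr[|\mathcal{T}| = t] = e^{-\mu t}(\mu t)^{t-1}/t!$ (an equivalent way of writing the Borel formula) in log-space and sums the first two moments, which should be close to $\frac{1}{1-\mu}$ and $\frac{1}{(1-\mu)^3}$.

```python
import math


def borel_pmf(t, mu):
    """Pr[|T| = t] for the size of a Poisson(mu)-Galton-Watson tree, 0 < mu < 1, t >= 1."""
    # Work with logarithms to avoid overflow in t^(t-1) and t!.
    log_p = -mu * t + (t - 1) * math.log(mu * t) - math.lgamma(t + 1)
    return math.exp(log_p)


def borel_moments(mu, t_max=2000):
    """Return (total mass, first moment, second moment), truncated at t_max."""
    total = mean = second = 0.0
    for t in range(1, t_max + 1):
        p = borel_pmf(t, mu)
        total += p
        mean += t * p
        second += t * t * p
    return total, mean, second
```

For $\mu = 0.5$ the three returned values are, up to truncation and rounding error, $1$, $2 = \frac{1}{1-\mu}$ and $8 = \frac{1}{(1-\mu)^3}$, in line with the identity $\mathbb{E}[|\mathcal{T}|^2] = \frac{1}{(1-\mu)^3}$ used for the variance bound.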


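Lemma 6. Let $P = (V,E)$ be a path of length $\ell$ and $B \subseteq E$ a set of size $|B| \le \alpha\ell$, where $\alpha \ge \frac{1}{\ell}$. Let $Q$ denote the graph obtained from $P$ by deleting all the edges in $B$. Then there exist vertex disjoint subpaths $\{Q^j\}_{j \in J}$ of $Q$ such that each $Q^j$ has length at least $\frac{1}{3\alpha}$ and the subpaths $\{Q^j\}_{j \in J}$ cover at least $\left(\frac{1}{3} - \alpha\right)\ell$ vertices of $V$.

Proof of Lemma 6. Since $P$ is a path, $Q$ consists of a union of vertex disjoint paths $\{Q^j\}_{j \in [k]}$ for some $k \le |B| + 1 \le \alpha\ell + 1$. Denoting by $\ell_j$ the length of the path $Q^j$ for $j \in [k]$, note that

$$\sum_{j \in [k]} \ell_j = \ell - |B| \ge (1-\alpha)\ell. \qquad (4)$$

Moreover, setting $J := \left\{ j \in [k] : \ell_j \ge \frac{1}{3\alpha} \right\}$ we see that

$$\sum_{j \notin J} \ell_j < k \cdot \frac{1}{3\alpha} \le \frac{1}{3}\ell + \frac{1}{3\alpha} \le \frac{2}{3}\ell. \qquad (5)$$

Putting (4) and (5) together we see that

$$\sum_{j \in J} \ell_j \ge \left(\frac{1}{3} - \alpha\right)\ell.$$

Thus, it follows that the paths $\{Q^j\}_{j \in J}$ satisfy the desired conditions. □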
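
The splitting step in Lemma 6 is easy to carry out explicitly. The following Python sketch is an illustration only: the function name `long_subpaths` and the convention that the path is $v_0 v_1 \dots v_\ell$ with edge $k$ joining $v_k$ and $v_{k+1}$ are assumptions made here. It deletes the edges in $B$ and keeps the pieces of length at least $\frac{1}{3\alpha}$.

```python
def long_subpaths(length, deleted, alpha):
    """Split the path v_0 ... v_length after deleting the edges indexed by `deleted`.

    Edge k (0 <= k < length) joins v_k and v_{k+1}.
    Returns the pieces of length >= 1 / (3 * alpha) as (first_vertex, last_vertex)
    pairs; a piece (a, b) has b - a edges and b - a + 1 vertices.
    """
    pieces = []
    start = 0
    for k in sorted(deleted) + [length]:
        pieces.append((start, k))  # current piece ends at vertex k
        start = k + 1              # next piece starts right after the deleted edge
    threshold = 1 / (3 * alpha)
    return [(a, b) for a, b in pieces if b - a >= threshold]
```

With $\ell = 12$, $B$ consisting of the single edge $v_3v_4$ and $\alpha = \frac{1}{12}$, only the piece from $v_4$ to $v_{12}$ survives; it covers 9 vertices, comfortably above the guaranteed $\left(\frac{1}{3} - \alpha\right)\ell = 3$.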


2.3 Properties of random graphs 

The next lemma provides bounds on the sizes of the largest and second largest connected components of $G \sim \mathcal{G}(n,p)$ as well as the size of its 2-core when $p = \frac{1+\varepsilon}{n}$ where $\varepsilon > 0$ is a small constant. This
lemma is a simple consequence of Theorem 5.4 of m and Theorem 3 of US]. 

Lemma 7. Let $p = \frac{1+\varepsilon}{n}$ where $\varepsilon > 0$ is a constant. Then there exists a constant $\varepsilon_0 > 0$ such that for every $\varepsilon \in (0,\varepsilon_0)$ the following holds whp for $G \sim \mathcal{G}(n,p)$:

(a) the largest connected component of $G$ has between $\varepsilon n$ and $3\varepsilon n$ vertices.

(b) the second largest connected component of $G$ has at most $\frac{20}{\varepsilon^2}\ln n$ vertices.

(c) the 2-core of the largest connected component of $G$ has at most $2\varepsilon^2 n$ vertices.
In [8], Ding, Lubetzky and Peres established a complete characterization of the structure of the giant component $C_1$ of $G \sim \mathcal{G}(n,p)$ in the strictly supercritical regime ($p = \frac{1+\varepsilon}{n}$ with $\varepsilon > 0$ constant). This was achieved by offering a tractable contiguous model $\tilde{C}_1$, i.e. a model such that every graph property that is satisfied by $\tilde{C}_1$ whp is also satisfied by $C_1$ whp. In their model, $\tilde{C}_1$ consists of a 2-core $\tilde{C}_1^{(2)}$ where one attaches to each vertex of $\tilde{C}_1^{(2)}$ one independent Poisson($\mu$)-Galton-Watson tree (where $0 < \mu < 1$ is such that $\mu e^{-\mu} = (1+\varepsilon)e^{-(1+\varepsilon)}$). In light of this, any graph property that is satisfied whp by the disjoint union of $|\tilde{C}_1^{(2)}|$ independent Poisson($\mu$)-Galton-Watson trees must also be satisfied whp by $C_1 \setminus C_1^{(2)}$, the graph obtained from the giant component $C_1$ by removing the edges of its 2-core $C_1^{(2)}$. As one would expect, the random variable $|\tilde{C}_1^{(2)}|$ is tightly concentrated around its expectation, which agrees with the expected size of the 2-core of $C_1$. By (c) of Lemma 7 this is at most $2\varepsilon^2 n$. The next technical lemma, which will be useful in the proof of Theorem 2, follows from the considerations above.

Lemma 8. Let $C_1$ denote the largest connected component of $G \sim \mathcal{G}(n,p)$ for $p = \frac{1+\varepsilon}{n}$ where $\varepsilon > 0$ is fixed, let $C_1^{(2)}$ denote its 2-core and let $C_1 \setminus C_1^{(2)}$ denote the graph obtained from $C_1$ by removing the edges in $C_1^{(2)}$. Let $0 < \mu < 1$ be such that $\mu e^{-\mu} = (1+\varepsilon)e^{-(1+\varepsilon)}$ and consider $2\varepsilon^2 n$ independent Poisson($\mu$)-Galton-Watson trees $\mathcal{T}_1, \dots, \mathcal{T}_{2\varepsilon^2 n}$. Then, for every $\ell$ and $m$ (which might depend on $n$), if whp the disjoint union of $\mathcal{T}_1, \dots, \mathcal{T}_{2\varepsilon^2 n}$ does not contain a set of vertex disjoint paths of length at least $\ell$ covering at least $m$ vertices then the same holds whp for $C_1 \setminus C_1^{(2)}$.


3 Proof of Theorem 1


We start this section by repeating the statement of Theorem 1 for the reader's convenience.


Theorem 1. There exists an absolute constant $C > 0$ such that the following holds. For every constant $q \in (0,1)$ there exist $n_0, \varepsilon_0 > 0$ such that for every fixed $\varepsilon \in (0,\varepsilon_0)$ and any $n \ge n_0$ there is no adaptive algorithm which reveals a path of length $\ell \ge \frac{3C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ with probability at least $q$ in $G \sim \mathcal{G}(n,p)$, where $p = \frac{1+\varepsilon}{n}$, by querying at most $\frac{q\ell}{8640Cp\varepsilon\ln(1/\varepsilon)}$ pairs of vertices.

Proof of Theorem 1. Suppose Alg is an adaptive algorithm which with probability at least $q$ finds a path of length $\ell$ in $G \sim \mathcal{G}(n,p)$, where $p = \frac{1+\varepsilon}{n}$, after querying at most $\frac{q\ell}{8640Cp\varepsilon\ln(1/\varepsilon)}$ pairs of vertices. We consider implicitly that Alg takes an ordered vertex set as part of its input. We shall assume henceforth that $n, C > 0$ are sufficiently large and $\varepsilon > 0$ is sufficiently small in order to obtain a contradiction. Note that, by restricting Alg to a set of $n$ vertices, we get an algorithm which for any $n' \ge n$ with probability at least $q$ finds in $G' \sim \mathcal{G}(n',p)$ a path of length $\ell$ after querying at most $\frac{q\ell}{8640Cp\varepsilon\ln(1/\varepsilon)}$ pairs of vertices. We shall abuse notation slightly and use Alg to refer to all these algorithms.


Define $n' := \left(1 + \frac{1440\varepsilon^2}{q}\right)n$, $V_0 := [n']$, $I_0 := \emptyset$ and $s := \frac{720\varepsilon^2 n}{q\ell}$. For $i = 1, \dots, s$ do the following:


• Apply Alg to $G_{i-1} \sim \mathcal{G}(V_{i-1},p)$, where the vertices in $V_{i-1}$ are permuted according to a permutation $\pi_i \in S_{V_{i-1}}$ chosen uniformly at random. Let $L_i$ be the graph of all pairs of vertices queried and let $K_i \subseteq L_i$ be the graph of edges present. By the algorithm we know that $L_i$ has at most $\frac{q\ell}{8640Cp\varepsilon\ln(1/\varepsilon)}$ edges. If $K_i$ contains a path of length $\ell$ then let $P_i$ be one such path, define $V_i := V_{i-1} \setminus V(P_i)$ and set $I_i := I_{i-1} \cup \{i\}$. Otherwise, set $V_i := V_{i-1}$ and $I_i := I_{i-1}$.
Observe that $|V_i| \ge n' - (\ell+1)s \ge n$ and so we can indeed apply Alg to $V_{i-1}$ for any $i \in [s]$. We define a random graph $H$ with vertex set $V_0$ in the following way. For every pair of vertices $\{u,v\} \subseteq V_0$, if $\{u,v\} \in E(L_i)$ for some $i \in [s]$ then let $i_0$ be the smallest such index and set $\{u,v\}$ as an edge of $H$ if and only if $\{u,v\} \in E(K_{i_0})$. Consider all the other pairs $\{u,v\} \subseteq V_0$ as non-edges of $H$. From the procedure above it follows that for every $\{u,v\} \subseteq V_0$ we have independently that

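$$\Pr[\{u,v\} \in E(H)] \le p = \frac{1+\varepsilon}{n} = \frac{1+\varepsilon}{n'}\cdot\frac{n'}{n} \le \frac{1+2\varepsilon}{n'},$$

provided $\varepsilon$ is sufficiently small. Thus, the graph $H$ can be viewed as a subgraph of a graph sampled from $\mathcal{G}\left(n', \frac{1+2\varepsilon}{n'}\right)$. In particular, if with probability at least $\frac{q^2}{4}$ the graph $H$ contains a set of vertex disjoint paths of length at least $\frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ which cover at least $52\varepsilon^2 n'$ vertices then the same must also hold with probability at least $\frac{q^2}{4}$ in $\mathcal{G}\left(n', \frac{1+2\varepsilon}{n'}\right)$. However, this would contradict Theorem 2 (applied with $2\varepsilon$ in place of $\varepsilon$, for which the covering threshold is $13(2\varepsilon)^2 n' = 52\varepsilon^2 n'$) and so it suffices to prove the following claim:

Claim. With probability at least $\frac{q^2}{4}$ the graph $H$ contains a set of vertex disjoint paths of length at least $\frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ which cover at least $52\varepsilon^2 n'$ vertices of $V_0$.

Define for each $i \in I_s$ the graph $H_i$ with vertex set $V_{i-1}$ and edge set $\bigcup_{j<i} E(L_j) \cap \binom{V_{i-1}}{2}$, and note that

$$|E(H_i)| \le s\cdot\frac{q\ell}{8640Cp\varepsilon\ln(1/\varepsilon)} \le \frac{\varepsilon n^2}{12C\ln(1/\varepsilon)(1+\varepsilon)} \le \frac{\varepsilon}{6C\ln(1/\varepsilon)}\binom{|V_{i-1}|}{2}. \qquad (6)$$

Observe that for each $i \in I_s$ the set $V_{i-1} \setminus V_i$ consists of the vertex set of a path $P_i$ in the graph $K_i$. For each such $i$ set $B_i := E(P_i) \cap E(H_i)$ and let $Q_i$ denote the graph obtained from $P_i$ by deleting all the edges in $B_i$. Note crucially that $E(Q_i) \subseteq E(H)$ and that the graphs $\{Q_i\}_{i \in I_s}$ are vertex disjoint.

Consider now the set $I := \left\{ i \in I_s : |B_i| \le \frac{\varepsilon\ell}{3C\ln(1/\varepsilon)} \right\}$. By Lemma 6 (applied with $\alpha = \frac{\varepsilon}{3C\ln(1/\varepsilon)}$) it follows that for any $i \in I$ there exist vertex disjoint subpaths $\{Q_i^j\}_{j \in J_i}$ of $Q_i$ each of length at least $\frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ which cover at least $\left(\frac{1}{3} - \frac{\varepsilon}{3C\ln(1/\varepsilon)}\right)\ell \ge \frac{\ell}{4}$ vertices of $V(Q_i)$. Thus, if $|I| \ge \frac{sq}{3}$ then $\{Q_i^j\}_{i \in I, j \in J_i}$ forms a collection of vertex disjoint paths in $H$ of length at least $\frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ which cover at least $\frac{\ell}{4}\cdot\frac{sq}{3} = 60\varepsilon^2 n \ge 52\varepsilon^2 n'$ vertices of $V_0$. It suffices to show then that with probability at least $\frac{q^2}{4}$ we have $|I| \ge \frac{sq}{3}$.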

Let $I' := [s] \setminus I$ and note that for every $i \in [s]$ we have

$$\Pr[i \in I'] = \Pr[i \notin I_s] + \Pr[i \in I' \mid i \in I_s]\cdot\Pr[i \in I_s]. \qquad (7)$$

It is clear from the procedure above that for each $i \in [s]$ we have $\Pr[i \in I_s] \ge q$. Note also crucially that, provided $i \in I_s$, the path $P_i$ is a randomly mapped path of length $\ell$ on the vertex set $V_{i-1}$. Indeed, this happens because before the $i$-th application of Alg we permuted the vertices of $V_{i-1}$ according to a permutation $\pi_i \in S_{V_{i-1}}$ chosen uniformly at random. Thus, by conditioning on the event that $i \in I_s$, on any possible graph $H_i$ satisfying (6) and on the path $\pi_i^{-1}(P_i)$, we have for any $e \in E(\pi_i^{-1}(P_i))$:

$$\Pr[\pi_i(e) \in E(H_i)] \le \frac{\varepsilon}{6C\ln(1/\varepsilon)},$$

and so, by linearity of expectation it follows that:

$$\mathbb{E}\big[|E(P_i) \cap E(H_i)|\big] \le \frac{\varepsilon\ell}{6C\ln(1/\varepsilon)}.$$

Thus, by Markov’s inequality (see, e.g., S) we get that 


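$$\Pr[i \in I' \mid i \in I_s] \le \frac{1}{2},$$

and so by equation (7) we see that for any $i \in [s]$ we have $\Pr[i \in I'] \le 1 - \frac{1}{2}\Pr[i \in I_s] \le 1 - \frac{q}{2}$. It follows then by linearity of expectation that $\mathbb{E}[|I'|] \le s\left(1 - \frac{q}{2}\right)$. Hence, again by Markov's inequality we conclude that

$$\Pr\left[|I'| \ge \frac{s}{1 + \frac{q}{2}}\right] \le 1 - \frac{q^2}{4}, \quad\text{which implies}\quad \frac{q^2}{4} \le \Pr\left[|I| \ge \frac{sq}{2+q}\right] \le \Pr\left[|I| \ge \frac{sq}{3}\right].$$

This completes the proof. □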


4 Proof of Theorem 2

Theorem 2. There exist constants $C, \varepsilon_0 > 0$ such that for every fixed $\varepsilon \in (0,\varepsilon_0)$ we have whp that $G \sim \mathcal{G}\left(n, \frac{1+\varepsilon}{n}\right)$ does not contain a set of vertex disjoint paths of lengths at least $\frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ whose union covers at least $13\varepsilon^2 n$ vertices.

Proof of Theorem 2. Let $G \sim \mathcal{G}(n,p)$ where $p = \frac{1+\varepsilon}{n}$. Let $C_1$ denote the largest connected component of $G$, let $C_1^{(2)}$ denote the 2-core of $C_1$ and let $C_1 \setminus C_1^{(2)}$ denote the graph obtained from $C_1$ by deleting the edges in $C_1^{(2)}$. For $\ell \ge 1$ consider the following random variables:

• $X_\ell$ = number of vertices which belong to connected components of $G$ of size at most $\frac{20}{\varepsilon^2}\ln n$ containing a path of length at least $\ell$.

• $Y_\ell$ = maximum number of vertices covered by vertex disjoint paths of length at least $\ell$ in $C_1$.

• $Z_\ell$ = maximum number of vertices covered by vertex disjoint paths of length at least $\frac{\ell}{3}$ in $C_1 \setminus C_1^{(2)}$.

By (b) of Lemma 7 it follows that whp $X_\ell + Y_\ell$ is an upper bound on the maximum number of vertices of $G$ covered by vertex disjoint paths of length at least $\ell$. Note that we may assume that all the paths considered have size at most $2\ell$ by splitting larger paths into several paths of length at least $\ell$. Moreover, if $P$ is a path of length at least $\ell$ in $C_1$ then, since $C_1 \setminus C_1^{(2)}$ consists of a disjoint union of trees, there must exist a subpath $P'$ of the path $P$ with at least $\frac{\ell}{3}$ vertices which lies in $C_1^{(2)}$ or in $C_1 \setminus C_1^{(2)}$. Since $|P| \le 6|P'|$ it follows that $Y_\ell \le 6|C_1^{(2)}| + 6Z_\ell$.

By (c) of Lemma 7 we know that whp $|C_1^{(2)}| \le 2\varepsilon^2 n$, provided $\varepsilon_0$ is chosen small enough. It suffices then to show that there exist constants $C, \varepsilon_0 > 0$ such that for every fixed $\varepsilon \in (0,\varepsilon_0)$ and for $\ell := \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ we have whp that

$$X_\ell \le \varepsilon^3 n \quad\text{and}\quad Z_\ell \le 29\varepsilon^5 n,$$

since in that case we have whp that the maximum number of vertices of $G$ covered by vertex disjoint paths of length at least $\ell$ is at most

$$X_\ell + Y_\ell \le X_\ell + 6|C_1^{(2)}| + 6Z_\ell \le \varepsilon^3 n + 6\cdot 2\varepsilon^2 n + 6\cdot 29\varepsilon^5 n \le 13\varepsilon^2 n,$$

provided $\varepsilon_0$ is chosen sufficiently small. Lemmas 9 and 10 complete the proof. □
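Lemma 9. There exist constants $C, \varepsilon_0 > 0$ such that for every fixed $\varepsilon \in (0,\varepsilon_0)$ and for $\ell := \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ we have $X_\ell \le \varepsilon^3 n$ whp.

Proof of Lemma 9. Given a set $S \subseteq [n]$ of size $t$, let $\mathcal{S}_\ell(S)$ (resp. $\mathcal{T}_\ell(S)$) denote the set of possible connected graphs (resp. spanning trees) on the vertex set $S$ which contain a path of length at least $\ell$. Let $X_S$ denote the indicator random variable of the event that $G[S] \in \mathcal{S}_\ell(S)$ and that there are no edges in $G$ between $S$ and $[n] \setminus S$. Note that $G[S] \in \mathcal{S}_\ell(S)$ if and only if there exists $T \in \mathcal{T}_\ell(S)$ such that $T \subseteq G[S]$. Thus, by the union bound we have

$$\mathbb{E}[X_S] \le |\mathcal{T}_\ell(S)| \cdot p^{t-1} \cdot (1-p)^{t(n-t)}, \qquad (8)$$

where the first term accounts for taking a union bound over all $T \in \mathcal{T}_\ell(S)$, the second term accounts for the probability that the edges in $T$ are present in $G[S]$ and the last term accounts for the probability that none of the edges between $S$ and $[n] \setminus S$ are present in $G$. Note that $|\mathcal{T}_\ell(S)|$ does not depend on the set $S$ and is equal to the number of labeled trees on $t$ vertices which contain a path of length at least $\ell$. More specifically, if $p_{t,\ell}$ denotes the proportion of labeled trees on $t$ vertices which contain a path of length at least $\ell$, then $|\mathcal{T}_\ell(S)| = p_{t,\ell}\cdot t^{t-2}$. Observe now that the random variable $X_\ell$ satisfies the following:

$$X_\ell = \sum_{t=\ell}^{\frac{20}{\varepsilon^2}\ln n}\ \sum_{S \in \binom{[n]}{t}} t \cdot X_S.$$

We claim that for $\ell := \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$, where $C > 0$ is a large constant, and for some constant $\varepsilon_0 > 0$, if $\varepsilon \in (0,\varepsilon_0)$ is fixed then $\Pr[X_\ell > \varepsilon^3 n] = o(1)$. To prove this claim we start by estimating $\mathbb{E}[X_\ell]$. Setting $t_0 := \frac{15}{\varepsilon^2}\ln\left(\frac{1}{\varepsilon}\right)$, we have by the linearity of expectation and by (8) that if $\varepsilon_0$ is sufficiently small then:

$$\begin{aligned}
\mathbb{E}[X_\ell] &\le \sum_{t=\ell}^{\frac{20}{\varepsilon^2}\ln n} \binom{n}{t}\cdot p_{t,\ell}\cdot t^{t-2}\cdot t\cdot p^{t-1}\cdot(1-p)^{t(n-t)} \\
&\le \sum_{t=\ell}^{\frac{20}{\varepsilon^2}\ln n} \frac{n^t}{t!}\cdot p_{t,\ell}\cdot t^{t-1}\left(\frac{1+\varepsilon}{n}\right)^{t-1}\left(1 - \frac{1+\varepsilon}{n}\right)^{t(n-t)} \\
&\le \frac{(1+o(1))n}{\ell(1+\varepsilon)}\sum_{t \ge \ell} p_{t,\ell}\,(1+\varepsilon)^t e^{-\varepsilon t} \\
&\le \frac{2n}{\ell}\left(\sum_{\ell \le t \le t_0} p_{t,\ell} + \sum_{t > t_0} e^{-\varepsilon^2 t/3}\right), \qquad (9)
\end{aligned}$$

where in the third inequality we used the fact that $(1+\varepsilon)^t \le e^{(\varepsilon - \varepsilon^2/3)t}$ for sufficiently small $\varepsilon > 0$. By Lemma 4 there exist constants $C, \varepsilon_0 > 0$ such that the first sum in (9) is at most $\varepsilon^3$. Moreover, by (3) the second sum in (9) is at most $6\varepsilon^3$. Thus, all in all, we conclude that there exist constants $C, \varepsilon_0 > 0$ such that

$$\mathbb{E}[X_\ell] \le \frac{2n}{\ell}\cdot(\varepsilon^3 + 6\varepsilon^3) \le \frac{\varepsilon^3 n}{2}.$$

Note that if $G$ and $H$ differ in precisely one edge then $|X_\ell(G) - X_\ell(H)| \le \frac{40}{\varepsilon^2}\ln n$ because one edge affects at most two connected components of size at most $\frac{20}{\varepsilon^2}\ln n$. Thus, by Lemma 2 it follows that

$$\Pr[X_\ell > \varepsilon^3 n] \le \Pr\left[X_\ell - \mathbb{E}[X_\ell] > \frac{\varepsilon^3 n}{2}\right] \le e^{-\Omega\left(\frac{n}{(\ln n)^2}\right)} = o(1).$$

□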


Remark. An alternative approach to the proof of Lemma 9 would be to invoke the so called symmetry rule (see, e.g., Chapter 5.6 of m), postulating that in the supercritical regime $p = \frac{1+\varepsilon}{n}$ the subgraph of $G \sim \mathcal{G}(n,p)$ outside the giant component behaves typically as a random graph with subcritical edge probability. One can then estimate the likely contribution of paths of length at least $\ell = \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ coming from the small components to the total volume of vertex disjoint paths of length at least $\ell$ and show it to be $O(\varepsilon^3 n)$ whp, using a direct first moment argument. Since we still need to treat the paths residing in the giant component outside the 2-core (the random variable $Z_\ell$), we chose to adopt a unified approach using the machinery of Galton-Watson trees developed in Section 2.2 and to apply it here as well.

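Lemma 10. There exist constants $C, \varepsilon_0 > 0$ such that for every fixed $\varepsilon \in (0,\varepsilon_0)$ and for $\ell := \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ we have $Z_\ell \le 29\varepsilon^5 n$ whp.

Proof of Lemma 10. Recall that $Z_\ell$ counts the maximum number of vertices covered by vertex disjoint paths of length at least $\frac{\ell}{3}$ in $C_1 \setminus C_1^{(2)}$. Let $0 < \mu < 1$ be such that $\mu e^{-\mu} = (1+\varepsilon)e^{-(1+\varepsilon)}$ and consider $2\varepsilon^2 n$ independent Poisson($\mu$)-Galton-Watson trees $\mathcal{T}_1, \dots, \mathcal{T}_{2\varepsilon^2 n}$. By Lemma 8 it suffices for our purposes to show that whp the maximum number of vertices covered by vertex disjoint paths of length at least $\frac{\ell}{3}$ in the disjoint union of $\mathcal{T}_1, \dots, \mathcal{T}_{2\varepsilon^2 n}$ is less than $29\varepsilon^5 n$, for appropriate $C, \varepsilon_0 > 0$.

For each $1 \le i \le 2\varepsilon^2 n$ consider the following random variable:

$$T_{i,\ell} = \begin{cases} |\mathcal{T}_i| & \text{if } \mathcal{T}_i \text{ contains a path of length at least } \frac{\ell}{3}, \\ 0 & \text{otherwise,} \end{cases}$$

and set $T_\ell = \sum_{i=1}^{2\varepsilon^2 n} T_{i,\ell}$. Clearly $T_\ell$ is an upper bound on the maximum number of vertices covered by vertex disjoint paths of length at least $\frac{\ell}{3}$ in the disjoint union of $\mathcal{T}_1, \dots, \mathcal{T}_{2\varepsilon^2 n}$. To finish the proof, we show that whp $T_\ell < 29\varepsilon^5 n$, provided $C, \varepsilon_0 > 0$ are chosen appropriately.

By Lemma 5 we know that there exist constants $C, \varepsilon_0 > 0$ such that for every $\varepsilon \in (0,\varepsilon_0)$ and for $\ell = \frac{C}{\varepsilon}\ln\left(\frac{1}{\varepsilon}\right)$ we have $\mathbb{E}[T_{i,\ell}] \le 14\varepsilon^3$ and $\mathrm{Var}[T_{i,\ell}] \le \frac{8}{\varepsilon^3}$. Thus, since the random variables $T_{i,\ell}$ are independent, we have that

$$\mathbb{E}[T_\ell] \le 14\varepsilon^3\cdot 2\varepsilon^2 n = 28\varepsilon^5 n \quad\text{and}\quad \mathrm{Var}[T_\ell] \le \frac{8}{\varepsilon^3}\cdot 2\varepsilon^2 n = \frac{16n}{\varepsilon}.$$

Thus, by Chebyshev's Inequality (see, e.g., 01) we conclude that

$$\Pr[T_\ell \ge 29\varepsilon^5 n] \le \Pr\big[|T_\ell - \mathbb{E}[T_\ell]| \ge \varepsilon^5 n\big] \le \frac{\mathrm{Var}[T_\ell]}{\varepsilon^{10}n^2} \le \frac{16}{\varepsilon^{11}n} = o(1).$$

□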
5 Concluding remarks


We have shown that in order to find a path of length $\ell = \Omega\left(\frac{\log(1/\varepsilon)}{\varepsilon}\right)$ in $G \sim \mathcal{G}(n,p)$ with at least some constant probability, where $p = \frac{1+\varepsilon}{n}$ with $\varepsilon > 0$ fixed, one needs to query at least $\Omega\left(\frac{\ell}{p\varepsilon\log(1/\varepsilon)}\right)$ pairs of vertices. This is close to best possible since a randomised depth first search algorithm from m finds whp a path of length $\ell$ after querying at most $O\left(\frac{\ell}{p\varepsilon}\right)$ pairs of vertices. A natural question, which remains open, is to close the gap between these bounds. We believe that every adaptive algorithm which reveals whp a path of length $\ell$ in $G \sim \mathcal{G}(n,p)$, where $p = \frac{1+\varepsilon}{n}$ with $\varepsilon > 0$ fixed, has to query $\Omega\left(\frac{\ell}{p\varepsilon}\right)$ pairs of vertices.

Recall that, to prove our main result, in Theorem 2 we bounded the total number of vertices covered by vertex disjoint paths of size at least $\Omega\left(\frac{1}{\varepsilon}\log\left(\frac{1}{\varepsilon}\right)\right)$ in a typical graph sampled from $\mathcal{G}(n,p)$, $p = \frac{1+\varepsilon}{n}$, by $O(\varepsilon^2 n)$. Since a graph $G \sim \mathcal{G}(n,p)$ contains whp a path of length $\Theta(\varepsilon^2 n)$ (see e.g. [TT]), this is best possible up to a multiplicative constant. If one can show that a similar statement holds for paths of length $\Omega\left(\frac{1}{\varepsilon}\right)$ then one can modify our proof to obtain an $\Omega\left(\frac{\ell}{p\varepsilon}\right)$ bound in Theorem 1.

In the proof of Theorem 2 we needed to bound the number of vertices covered by vertex disjoint paths of a prescribed length in a random tree of fixed size $t$ (Lemma 5). Our estimate was a bit wasteful because for trees which contained a path of length $\ell$ we used their total number of vertices $t$ instead of the number of vertices covered by vertex disjoint paths of length $\ell$, which is most likely significantly smaller. A way to fix this is to obtain good bounds for the following question:


Question. Given $a = a(t) \in \mathbb{N}$ and $b = b(t) \in \mathbb{N}$ what is the probability that a random tree on $t$ vertices contains $b$ vertex disjoint paths, each of length at least $a$?

Note that, since the diameter of a random tree on $t$ vertices is whp $\Theta(\sqrt{t})$ (see e.g. m), the only interesting regime is when $ab \ge C\sqrt{t}$ for some constant $C > 0$. Moreover, by splitting paths of length larger than $2a$ into smaller subpaths of length at least $a$, we may consider only paths of length between $a$ and $2a$.

One possible approach to this problem would be through a nice argument of Joyal ([12], see also [2]). It shows that a random tree $T$ on $t$ vertices can be obtained from a random map $f : [t] \to [t]$ as follows. First we create the directed graph $D$ (possibly with loops) on vertex set $[t]$ with edges $i \to f(i)$ for each $i \in [t]$. Then we look at a maximal set of vertices $M = \{i_1, \dots, i_m\} \subseteq [t]$ such that $f|_M$ is a permutation. We remove the directed edges inside $M$ and replace them by the path $f(i_1) \to f(i_2) \to \dots \to f(i_m)$ (where $i_1 < i_2 < \dots < i_m$). By ignoring the orientations of the edges we obtain the desired tree $T$. Note that, since the vertices in $M$ form a path in $T$, we must have $|M| = O(\sqrt{t})$ whp. Moreover, if we have a path $P$ in $T$ then a moment's thought reveals that either
P has at least vertices in M or there are vertices of P which form a directed path in D. 

Thus, it follows that if we have a collection of b vertex disjoint paths in T each of length between a 
and 2a then D contains a collection of vertex disjoint directed paths each of length between and 
2a covering at least — \M\ vertices. Since \M\ = 0{y/i) whp and since we are interested only in 

the case when ab > Cy/t for some large constant G > 0, it follows that in that case we have, say, at 
least such paths. Thus, up to changing a and b by constant multiplicative factors, it is enough to 
estimate the probability that the directed graph $D$ obtained from a random map $f : [t] \to [t]$ contains at least $b$ vertex disjoint directed paths, each of length (at least) $a$.
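
The construction of a tree from a map $f : [t] \to [t]$ described above can be made concrete with a short Python sketch. It is an illustration under assumptions made here: vertices are labelled $0, \dots, t-1$ instead of $1, \dots, t$, and the names `joyal_tree`, `is_tree` and `count_distinct_trees` are hypothetical helpers. The set $M$ is computed as the set of points lying on cycles of $f$, which is the maximal set on which $f$ acts as a permutation.

```python
from itertools import product


def joyal_tree(f):
    """Build the tree associated with the map i -> f[i] on {0, ..., t-1}.

    Returns a list of t - 1 undirected edges (u, v).
    """
    t = len(f)
    # Iterating the image t times shrinks {0, ..., t-1} down to the cyclic points M.
    on_cycle = set(range(t))
    for _ in range(t):
        on_cycle = {f[i] for i in on_cycle}
    M = sorted(on_cycle)                      # i_1 < i_2 < ... < i_m
    spine = [f[i] for i in M]                 # f(i_1), f(i_2), ..., f(i_m)
    edges = [(spine[k], spine[k + 1]) for k in range(len(spine) - 1)]
    # Outside M the edges i -> f(i) of D are kept, with orientation ignored.
    edges += [(i, f[i]) for i in range(t) if i not in on_cycle]
    return edges


def is_tree(t, edges):
    """Check via union-find that `edges` form a spanning tree on {0, ..., t-1}."""
    parent = list(range(t))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False                      # this edge would close a cycle
        parent[ru] = rv
    return len(edges) == t - 1


def count_distinct_trees(t):
    """Number of distinct trees produced over all t^t maps f."""
    trees = set()
    for f in product(range(t), repeat=t):
        edges = joyal_tree(list(f))
        assert is_tree(t, edges)
        trees.add(frozenset(frozenset(e) for e in edges))
    return len(trees)
```

Enumerating all maps for small $t$ recovers every labelled tree: `count_distinct_trees(t)` equals $t^{t-2}$, the number of labelled trees on $t$ vertices, for example 3 when $t = 3$ and 16 when $t = 4$.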
We can give a simple upper bound on this probability by taking the union bound over all collections of $b$ vertex disjoint directed paths of length $a$. This shows that the probability that we want to estimate is at most

$$\frac{t!}{(t - (a+1)b)!\,b!}\left(\frac{1}{t}\right)^{ab} = \frac{t^b}{b!}\prod_{i=1}^{(a+1)b-1}\left(1 - \frac{i}{t}\right) \le \exp\left(b\ln(t/b) + b - \frac{(a+1)b\,\big((a+1)b - 1\big)}{2t}\right).$$

Unfortunately, this upper bound is not strong enough to allow us to prove Theorem 2 for paths of length at least $\Omega\left(\frac{1}{\varepsilon}\right)$ because when $b$ is roughly a constant and $a$ is close to $\sqrt{t}$ the positive term $b\ln(t/b)$ in the exponent is much larger than the negative term, which is of order $\frac{a^2b^2}{t}$. Thus, it would be nice to obtain tighter bounds for the probability in question.